# Video Text Understanding

Vica2 Init
Apache-2.0
ViCA2 is a multimodal vision-language model focused on video understanding and visual-spatial cognition tasks.
Video-to-Text Transformers English
nkkbr
30
0
Vica2 Stage2 Onevision Ft
Apache-2.0
ViCA2 is a 7B-parameter multimodal vision-language model focused on video understanding and visual-spatial cognition tasks.
Video-to-Text Transformers English
nkkbr
63
0
Videochat R1 Thinking 7B
Apache-2.0
VideoChat-R1-thinking_7B is a multimodal model based on Qwen2.5-VL-7B-Instruct, focusing on video-text-to-text tasks.
Video-to-Text Transformers English
OpenGVLab
800
0
Videochat TPO
MIT
A multimodal large language model based on the paper 'Task Preference Optimization: Improving Multimodal Large Language Models through Visual Task Alignment'.
Text-to-Video Transformers
OpenGVLab
18
5
© 2025 AIbase